The corresponding journal entry for this assignment is available on GitHub at this link: https://github.com/bcb420-2022/Jedid_Ahn/wiki/Entry-%2311:-Notes-from-lecture-for-A3





# Libraries to install here.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

if (!requireNamespace("GEOquery", quietly = TRUE)) {
  BiocManager::install("GEOquery")
}

if (!requireNamespace("edgeR", quietly = TRUE)) {
  BiocManager::install("edgeR")
}

if (!requireNamespace("limma", quietly = TRUE)) {
  BiocManager::install("limma")
}

if (!requireNamespace("circlize", quietly = TRUE)) {
  install.packages("circlize")
}

if (!requireNamespace("ComplexHeatmap", quietly = TRUE)) {
  BiocManager::install("ComplexHeatmap")
}

if (!requireNamespace("gprofiler2", quietly = TRUE)) {
  install.packages("gprofiler2")
}

# Libraries to load here.
library(Biobase)
library(GEOquery)
library(edgeR)
library(limma)
library(circlize)
library(ComplexHeatmap)
library(gprofiler2)

# Figure count: Start at 1.
count <- 1



Introduction

GEO dataset

The GEO dataset that I chose was GSE66261, which examines the expression and functions of long noncoding RNAs during human T helper cell differentiation.

GEO <- "GSE66261"
count_folder_path <- paste0(getwd(), "/", GEO)

if (!dir.exists(count_folder_path)){
  count_file_path <- rownames(GEOquery::getGEOSuppFiles(GEO))[1]
} else{
  count_file_path <- list.files(path = count_folder_path, full.names = TRUE)[1]
}

GSE66261 <- read.delim(count_file_path, header = TRUE, check.names = FALSE)

To better understand the data, groups were defined by formatting according to library, cell type, and condition. There are 2 libraries: 2695 and 2960, 2 cell types: Primary and effector, and 3 conditions: TH1, TH2, and TH17.

listed_titles <- colnames(GSE66261)[3:14]
gsm_titles <- c("TH1_Primary_2695", "TH2_Primary_2695", "TH17_Primary_2695", "TH1_Effector_2695", "TH2_Effector_2695", "TH17_Effector_2695", "TH1_Primary_2960", "TH2_Primary_2960", "TH17_Primary_2960", "TH1_Effector_2960", "TH2_Effector_2960", "TH17_Effector_2960")
  
exp_names <- data.frame(listed_titles, gsm_titles)

samples <- data.frame(lapply(exp_names$gsm_titles, 
                             function(x) { rev(unlist(strsplit(x, split = "_"))) }))
colnames(samples) <- listed_titles
rownames(samples) <- c("library", "cell_type", "condition")
samples <- data.frame(t(samples))


Filtering

Before normalization, the dataset was filtered by removing genes with low counts. Using edgeR protocol, features without at least 1 read per million in n of the samples were removed. Since there are 6 samples of each group, n = 6.

n <- 6

# Translate out counts into counts per million using the edgeR package.
cpms = edgeR::cpm(GSE66261[, 3:14])
rownames(cpms) <- GSE66261[, 1]

# Get rid of low counts
keep = rowSums(cpms > 1) >= n
GSE66261_filtered = GSE66261[keep, ]
rownames(GSE66261) <- 1:nrow(GSE66261)


HUGO gene symbols

The original dataset already came with HUGO gene symbols, so no mapping of symbols was required. In addition, tests showed that nearly 90% of symbols lined up with the symbols obtained from biomaRt.

Tests showed that there were 18 rows in total that consists of mappings to the same symbol. Two rows were updated by replacing their original HUGO gene symbol with the mapped gene symbol, while the rest were dropped to avoid compromising normalization and downstream analysis.

any(GSE66261_filtered$Name == "RNY1P13")
## [1] FALSE
GSE66261_filtered$Name[which(GSE66261_filtered$Feature == "ENSG00000201900")] <- "RNY1P13"

any(GSE66261_filtered$Name == "STRA6LP")
## [1] FALSE
GSE66261_filtered$Name[which(GSE66261_filtered$Feature == "ENSG00000254876")] <- "STRA6LP"

remove = duplicated(GSE66261_filtered$Name)
GSE66261_filtered <- GSE66261_filtered[!remove, ]


Normalization

Trimmed means of M-value (TMM) was chosen as the normalization technique as it a specialized normalization technique for RNASeq datasets.

filtered_data_matrix <- as.matrix(GSE66261_filtered[, 3:14])
rownames(filtered_data_matrix) <- GSE66261_filtered$Name
d = edgeR::DGEList(counts = filtered_data_matrix, group = samples$cell_type)

d = edgeR::calcNormFactors(d)
normalized_counts <- as.data.frame(edgeR::cpm(d))
rownames(normalized_counts) <- GSE66261_filtered$Name


Through the MDS plot as shown in Figure 1, we can see that groupings are favourable as it is clustering according to condition rather than library or cell type.
Figure 1: We observe that TH1 and TH2 samples cluster all together as they are both involved in immune response, while TH17 samples cluster together separately from TH1 and TH2.

Figure 1: We observe that TH1 and TH2 samples cluster all together as they are both involved in immune response, while TH17 samples cluster together separately from TH1 and TH2.


By observing the boxplot as shown in Figure 2, we can visualize the distribution of the data.
Figure 2: The boxplot shows that there aren't significant differences in the mean for each sample after normalization.

Figure 2: The boxplot shows that there aren’t significant differences in the mean for each sample after normalization.


On the other hand, the density plot as shown in Figure 3 shows that the data follows a soft bimodal distribution.
Figure 3: The soft bimodal distribution is something to keep in mind when performing differential expression analysis.

Figure 3: The soft bimodal distribution is something to keep in mind when performing differential expression analysis.


Finally, we calculated dispersion as shown in Figure 4 to determine how much variance deviates from the mean.
Figure 4: The BCV plot confirms the trend of more gene expression leading to dispersion values being closer together, and less variation overall.

Figure 4: The BCV plot confirms the trend of more gene expression leading to dispersion values being closer together, and less variation overall.


This is our final dataset of normalized counts, which will be used for differential expression analysis.

normalized_counts



Differential expression analysis

Linear model

Before running differential expression analysis, we have to create a linear model in R by creating a design matrix. We will account for both library variability and the condition of interest, which is contributing to the differential expression. This is demonstrated by the MDS plot shown in Figure 1, where samples are clustering according to condition.

model_design <- model.matrix(~ samples$library + samples$condition)

Fit the normalized counts to the model design created.

expression_mat <- as.matrix(normalized_counts)
minimal_set <- Biobase::ExpressionSet(assayData = expression_mat)
fit <- limma::lmFit(minimal_set, model_design)


Computation and multiple hypothesis testing

Then, we use empirical Bayes to compute differential expression.

fit2 <- limma::eBayes(fit, trend = TRUE)

(Answer to Q2) The Benjamini Hochberg method was used as the paper associated with the experiment specified that false discovery rate was used to correct for multiple testing.

output_hits <- limma::topTable(fit2, coef = ncol(model_design), adjust.method = "BH", number = nrow(expression_mat))

# Then, we sort by pvalue
output_hits <- output_hits[order(output_hits$P.Value),]

(Answer to Q1) 2376 genes are differentially expressed pass the threshold p-value of 0.05. 0.05 was chosen as it is the most commonly used cutoff for significance, which signifies a 95% chance that differential expression would not be observed if the condition had no effect.

length(which(output_hits$P.Value < 0.05))
## [1] 2376

(Answer to Q2) On the other hand, 504 genes pass correction.

length(which(output_hits$adj.P.Val < 0.05))
## [1] 504


MA plots

(Answer to Q3) We can visualize the amount of differentially expressed genes using an MA plot as shown in Figure 5. limma’s plotMA function was used. Based on the plot, we can see that nearly all differentially expressed genes have an expression log-ratio and average log-ratio close to 0.
Figure 5: Differentially expressed genes are coloured and bolded in red.

Figure 5: Differentially expressed genes are coloured and bolded in red.


(Answer to Q3) As shown in Figure 6, we will now investigate RAD50 in more detail as the lncRNA cluster in humans overlap the RAD50 gene according to the paper.
Figure 6: We can see that the MA plot shows a high degree of confidence in RAD50 that it is differentially expressed and has high regulation.

Figure 6: We can see that the MA plot shows a high degree of confidence in RAD50 that it is differentially expressed and has high regulation.


Heatmap

(Answer to Q4) Next, we will create a heatmap to visualize upregulation and downregulation of the top hits in the data according to a colour scale. Based on Figure 7, we see that the conditions are clustered together due to the following groupings:

-TA-4 = TH1 and -TA-1 = TH1
-TA-2 = TH2 and -TA-5 = TH2
-TA-6 = TH17 and -TA-3 = TH17

Figure 7: Not only are the conditions clustered together, but their genes show similar patterns of upregulation and downregulation.

Figure 7: Not only are the conditions clustered together, but their genes show similar patterns of upregulation and downregulation.



Thresholded over-representation analysis

(Answer to Q1) Since we are focused on threshold calculation rather than rank calculation, options included DAVID, EnrichR, and g:Profiler. g:Profiler was used due to familiarity and because it is a popular tool for functional enrichment analysis and visualization of gene lists. g:Profiler was used through the gprofiler2 R package rather than through their web browser.


Create thresholded list of genes

I will start by creating a thresholded list of genes, by calculating their rank and then retrieving upregulated and downregulated genes that are significant (pval < 0.05).

output_hits[, "rank"] <- -log(output_hits$P.Value, base = 10) * sign(output_hits$logFC)
output_hits <- output_hits[order(output_hits$rank), ]

upregulated_genes <- rownames(output_hits)[which(output_hits$P.Value < 0.05 & 
                                                   output_hits$logFC > 0)]
downregulated_genes <- rownames(output_hits)[which(output_hits$P.Value < 0.05 & 
                                                     output_hits$logFC < 0)]
all_genes <- rownames(output_hits)

To present the results, I will store them in tables.

dir.create(paste0(getwd(), "/data"))

write.table(x = upregulated_genes,
            file = file.path("data", "GSE66261_upregulated_genes.txt"), sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

write.table(x = downregulated_genes,
            file=file.path("data","GSE66261_downregulated_genes.txt"), sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

write.table(x = data.frame(genename = rownames(output_hits), F_stat = output_hits$rank),
            file = file.path("data", "GSE66261_ranked_genelist.txt"), sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

There are 1035 upregulated genes that are significant.

length(upregulated_genes)
## [1] 1035

On the other hand, there are 1341 upregulated genes that are significant.

length(downregulated_genes)
## [1] 1341


Gene set enrichment analysis using g:Profiler

(Answer to Q2) To maintain consistency with the differential expression analysis, Benjamini Hochberg FDR was used as the correction method for multiple hypothesis testing. Unfortunately, the corresponding paper for GSE66261 did not specify which annotation sources that they used for gene set enrichment.

GO:BP was used as gene ontology is a popular source for identifying relevant biological processes. In addition, KEGG, Reactome (REAC), and Wiki Pathways (WP) were all included as parameters due to a specific interest in learning the biological pathways associated with the genes of interest. Version 0.2.1 of the R package gprofiler2 was used, which is presumed to use the same version as the web browser tool. The web g:Profiler version is e105_eg52_p16_e84549, which was last updated on 03/01/2022.

gost_res <- gprofiler2::gost(all_genes, organism = "hsapiens", 
                             correction_method = c("fdr"),
                             user_threshold = 0.05,
                             sources = c("GO:BP", "KEGG", "REAC", "WP"))

(Answer to Q3) A p-value threshold of 0.05 was used as previously done for differential expression analysis. 3862 genesets were returned in total, with nearly 75% (2861 out of 3862) coming from GO:BP.

nrow(gost_res$result)
## [1] 3862
table(gost_res$result$source)
## 
## GO:BP  KEGG  REAC    WP 
##  2861   152   658   191

(Answer to Q3) If the p-value threshold is set to a more stringent value of 0.01, 3026 genesets are returned instead.

gost_res_stringent <- gprofiler2::gost(all_genes, organism = "hsapiens", 
                                       correction_method = c("fdr"),
                                       user_threshold = 0.01,
                                       sources = c("GO:BP", "KEGG", "REAC", "WP"))

nrow(gost_res_stringent$result)
## [1] 3026
table(gost_res_stringent$result$source)
## 
## GO:BP  KEGG  REAC    WP 
##  2208   122   559   137
Observing the plot function that gprofiler2 provides as shown in Figure 8, the term names whose -log10(p-adj) value is greater than 16 are the most significant and the ones that we are most interested in.

Figure 8: Despite the visualization, it is hard to gauge the number of term names that we are most interested in in due to the capping of values whose -log10(p-adj) > 16.


Upregulated vs downregulated g:Profiler analysis

(Answer to Q4) Next, we will run the g:Profiler analysis against the upregulated and downregulated genes both, with a threshold of 0.05.

gost_res_up <- gprofiler2::gost(upregulated_genes, organism = "hsapiens", 
                                correction_method = c("fdr"),
                                user_threshold = 0.05,
                                sources = c("GO:BP", "KEGG", "REAC", "WP"))
gost_res_down <- gprofiler2::gost(downregulated_genes, organism = "hsapiens", 
                                  correction_method = c("fdr"),
                                  user_threshold = 0.05,
                                  sources = c("GO:BP", "KEGG", "REAC", "WP"))


(Answer to Q4) Using all genes, we see that the term names that are most enriched are involved in metabolic and biosynthetic processes.

knitr::kable(gost_res$result$term_name[1:10])
x
cellular macromolecule metabolic process
organic substance biosynthetic process
organonitrogen compound metabolic process
biosynthetic process
cellular biosynthetic process
cellular protein metabolic process
regulation of cellular metabolic process
protein metabolic process
macromolecule biosynthetic process
regulation of primary metabolic process


(Answer to Q4) Using only upregulated genes that are significant, we see that the term names that are most enriched are still involved in metabolic and biosynthetic processes.

knitr::kable(gost_res_up$result$term_name[1:10])
x
regulation of cellular process
biological regulation
small molecule metabolic process
biosynthetic process
oxoacid metabolic process
organic acid metabolic process
organic substance biosynthetic process
carboxylic acid metabolic process
regulation of biological process
regulation of cellular metabolic process


(Answer to Q4) On the other hand, enrichment of downregulated genes that are significant show term names that are involved in immune response and regulation as well as response to stimuli. Based on the results using all genes, this shows that upregulated genes dominate the landscape by contributing more to differential expression and downstream pathways.

knitr::kable(gost_res_down$result$term_name[1:10])
x
immune system process
regulation of response to stimulus
response to stimulus
regulation of cellular process
positive regulation of biological process
immune response
cellular response to stimulus
biological process involved in interspecies interaction between organisms
intracellular signal transduction
response to stress



Interpretation

  1. Do the over-representation results support conclusions or mechanism discussed in the original paper?

Answer: Yes, they do. The original paper states that long non-coding RNAs (lncRNAs) play essential roles in arrays of cellular processes, which is consistent with the over-representation results of all genes and upregulated genes. The paper also states that differentiation of T helper cells is a critical step to adaptive immune response to pathogens, which is consistent with the over-representation results of downregulated genes.


  1. Can you find evidence, i.e. publications, to support some of the results that you see. How does this evidence support your results.

Answer: Yes, the paper “Long Non-coding RNAs: Major Regulators of Cell Stress in Cancer” by Connerty et al. explains that lncRNAs have diverse roles in cellular processes such as acting as epigenetic modulators and promoting or inhibiting transcription. On the other hand, the textbook “Molecular Biology of the Cell. 4th edition.” by Alberts et al. states that helper T cells are required for almost all adaptive immune responses.


Non-thresholded Gene set Enrichment Analysis

The ranked gene list was created in A2 and stored in the data folder as GSE66261_ranked_genelist.txt. The code below then converts it into a .rnk file.

GSE66261_ranked_genelist <- read.table("data/GSE66261_ranked_genelist.txt", sep = "\t")
colnames(GSE66261_ranked_genelist) <- c("GeneName", "rank")
write.table(GSE66261_ranked_genelist, file = "data/GSE66261_ranked_genelist.rnk", sep = "\t", row.names = FALSE, quote = FALSE)

1. What method did you use? What genesets did you use? Make sure to specify versions and cite your methods.

I used GSEA version 4.2.3 and ran the GSEA preranked analysis. The geneset used was the GO biological pathways file curated by the Bader Lab, and downloaded using the code below. The geneset is dated April 1, 2022.

data_dir <- paste0(getwd(), "/")

gmt_url = "http://download.baderlab.org/EM_Genesets/current_release/Human/symbol/"
# list all the files on the server
filenames = RCurl::getURL(gmt_url)
tc = textConnection(filenames)
contents = readLines(tc)
close(tc)
# get the gmt that has all the pathways and does not include terms inferred
# from electronic annotations(IEA) start with gmt file that has pathways only
rx = gregexpr("(?<=<a href=\")(.*.GOBP_AllPathways_no_GO_iea.*.)(.gmt)(?=\">)", contents,
              perl = TRUE)
gmt_file = unlist(regmatches(contents, rx))
dest_gmt_file <- file.path(data_dir, gmt_file)
download.file(paste(gmt_url, gmt_file, sep = ""), destfile = dest_gmt_file)

When running GSEA, the following parameters were used:

  • Number of permutations: 1000
  • Maximum geneset size: 200
  • Minimum geneset size: 15
  • Collapse/Remap to gene symbols: No_Collapse


2. Summarize your enrichment results.

The positive phenotype in this study are genes that are upregulated when exposed to TH1, TH2, and TH17 conditions, while the negative phenotype represents genes that are downregulated when exposed to the same conditions.

With a nominal pvalue < 5%, 76 gene sets were significantly upregulated in the positive phenotype as shown in Figure 1. Top gene sets enriched for terms such as metabolic processes, cell differentiation, and development. The top gene of the most enriched gene set is TP53I3, whose translated protein is involved in cellular responses to oxidative stresses according to Gene Cards.

On the other hand, using the same pvalue cutoff, 1070 gene sets were significantly upregulated in the negative phenotype as shown in Figure 1. Top gene sets enriched primarily for immune response and response to viruses. The top gene of the most enriched gene set is IDO1, whose translated protein plays a role in antimicrobial and antitumor defense according to Gene Cards.

Figure 1: GSEA results


3. How do these results compare to the results from the thresholded analysis in Assignment #2. Compare qualitatively. Is this a straight forward comparison? Why or why not?

The thresholded analysis using g:Profiler on upregulated genes shows term names that are specific to metabolic and biosynthetic processes. Likewise, the top gene set in the positive phenotype for GSEA is related to metabolic process. However, the GSEA results also showed term names such as different types of development, morphogenesis, and cell differentiation as shown in Figure 2. This indicates that the results as a whole aren’t similar.

Figure 2: GSEA - Gene set term names for positive phenotype


On the other hand, the thresholded analysis using g:Profiler on downregulated genes shows term names such as immune response and regulation as well as response to stimuli. Gene sets returned by GSEA for the negative phenotype are also enriched for terms related to immune response. However, the response types are more diverse such as interferon gamma and TNFA signalling as shown in Figure 3.

Figure 3: GSEA - Gene set term names for negative phenotype


Despite some similarities between the g:Profiler results and GSEA results, this is not a straightforward comparison. This is because the thresholded analysis only looked at a subset of genes that were significant, while the non-thresholded analysis used all genes. This is evident in the terminology returned by GSEA, with a more diverse and unique set of term names to reflect all the genes.


Visualization of Gene set Enrichment Analysis in Cytoscape

Cytoscape 3.9.1 was used with EnrichmentMap installed through the App Manager.

1. Create an enrichment map - how many nodes and how many edges in the resulting map? What thresholds were used to create this map? Make sure to record all thresholds. Include a screenshot of your network prior to manual layout.

An enrichment map was created by scanning the GSEA output folder for enrichment data. Through the “Analyze Network” tool, it is revealed that there are 651 nodes and 1639 edges as shown in Figure 4.

Figure 4: Summary statistics of the enrichment map, which includes the number of nodes and edges.

When creating the map, thresholds included a node cutoff of pval = 0.05, and an edge cutoff of pval = 0.375. As shown in Figure 5, all nodes are coloured blue, which indicates that only downregulated genes passed the threshold when mapping.

Figure 5: Initial network after enrichment mapping


2. Annotate your network - what parameters did you use to annotate the network. If you are using the default parameters make sure to list them as well.

The network was annotated using the AutoAnnotate tool available in Cytoscape. The following parameters were used:

  • Annotate entire network
  • Use clusterMaker App
  • Label column: GS_DESCR
  • Cluster algorithm: MCL Cluster
  • Edge weight column: similarity_coefficient


3. Make a publication ready figure - include this figure with proper legends in your notebook.

Figure 6: A publication ready figure of the non-annotated network

Figure 7: A publication ready figure of the annotated network

Figure 8: Corresponding legend of both networks


4. Collapse your network to a theme network. What are the major themes present in this analysis? Do they fit with the model? Are there any novel pathways or themes?

The network was collapsed to a theme network using AutoAnnotate’s coSE Cluster Layout, as shown in Figure 9. Major themes are represented by large clusters of gene sets/nodes with a high number of connections/edges, with two being particularly relevant:

  • Molecular immuno adaptive and cd4 alpha differentiation
  • Cascade initiated toll and dectin downstream kb


Since all gene sets seem to contain downregulated genes only, both themes ultimately line up with the model and GSEA results as they are both implicated in immune response. However, the annotations reveal the specific themes that result in the immune response and allow for further investigation of their pathways.

Figure 9: Collapsed theme network


Interpretation

1. Do the enrichment results support conclusions or mechanism discussed in the original paper? How do these results differ from the results you got from Assignment #2 thresholded methods?

The enrichment results do support the conclusions of the original paper. Specifically, the “Molecular immuno adaptive” and “cd4 alpha differentiation” annotations highlight the differentiation of naive T helper cells (also known as CD4+ T cells) into effector T helper subsets through TH1, TH2, and TH17 polarizing conditions. The paper mentions that this is a critical step for adaptive immune response to pathogens.

The enrichment results differ from the A2 results as the GSEA results and subsequent annotation provide more information than simply immune response, which is too general and broad. Rather, the second set of annotations “Cascade initiated toll” and “dectin downstream kb” specify the signalling pathways at play during the adaptive immune response.


2. Can you find evidence, i.e. publications, to support some of the results that you see. How does this evidence support your result?

Yes, the paper “Toll-like receptor 4 signaling in T cells promotes autoimmune inflammation” by Reynolds et al. specifies that TLR4 is expressed in CD4+ T cells. In addition, the paper “Dectin-1 Signaling Update: New Perspectives for Trained Immunity” by Mata-Martinez et al. states that Dectin-1 signalling ignites the differentiation of naive CD4+ T cells to a TH1 or TH17 phenotype.


3. Transcription factors post analysis

A post analysis to the network was done to analyze 4 transcription factors which were determined to be key for governing cell fate decisions according to the paper. Specifically, they play a role in the selective activation and silencing of genes encoding lineage specific cytokines. The transcription factors are:

  • T-bet/TBX21 for TH1 condition
  • GATA3 for TH2 condition
  • ROR-γt/RORC and BATF for TH17 condition

The MSigDB human transcription factors file from download.baderlab.org (Human_TranscriptionFactors_MSigdb_April_01_2022_symbol.gmt) was initially used. It was then filtered to only include gene sets containing one of the 4 transcription factors mentioned. This code is provided in the journal entry associated with this assignment. The filtered signature gene set contains 99 gene sets in total, which was fed into Enrichment Map. A Mann-Whitley two-sided test was implemented for the post analysis. A cutoff of 0.0001 was used to limit the number of gene sets analyzed and to improve runtime. This led to 10 gene sets being fed into the analysis, as shown in Figure 10.

Figure 10: Signature gene sets selected for post analysis


Looking at a zoomed image of the annotated enrichment map after post analysis, we see that TGGAAA_NFAT_Q4_01 is highly associated with the two main clusters of molecular immuno adaptive and cascade initiated toll. Upon further search, the paper “NFAT, immunity and cancer: a transcription factor comes of age” by Martin R. Müller & Anjana Rao reveals that the class of NFAT transcription factors are of primary importance during T cell activation and differentiation. However, the paper fails to mention the expression of NFAT when exposed to TH1, TH2, and TH17 conditions. Thus, there should be a follow up with the authors of the paper by revealing the results of the post analysis and asking whether NFAT was examined in detail.

Figure 11: Enrichment map with annotations after the post analysis

Figure 12: Zoomed in enrichment map with annotations after the post analysis



References

Alberts, B., Johnson, A., Lewis, J., Raff, M., Roberts, K., & Walter, P. (2002). Molecular biology of the cell. 4th edition. Garland Science.
Connerty, P., Lock, R. B., & Bock, C. E. de. (2020). Long non-coding RNAs: Major regulators of cell stress in cancer. Frontiers in Oncology, 10. https://doi.org/10.3389/fonc.2020.00285
Davis, S., & Meltzer, P. (2007). GEOquery: A bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics (Oxford, England), 14, 1846–1847. https://doi.org/10.1093/bioinformatics/btm254
Gu, Z., Eils, R., & Schlesner, M. (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics (Oxford, England), 32(18), 2847–2849. https://doi.org/10.1093/bioinformatics/btw313
Gu, Z., Gu, L., Eils, R., Schlesner, M., & Brors, B. (2014). Circlize implements and enhances circular visualization in r. Bioinformatics (Oxford, England), 30(19), 2811–2812. https://doi.org/10.1093/bioinformatics/btu393
Isserlin, R. (2022a). BCB420: Lecture 10 - recap and GSEA.
Isserlin, R. (2022b). BCB420: Lecture 11 - Cytoscape101.
Isserlin, R. (2022c). BCB420: Lecture 12 - enrichment map and other cytoscape apps.
Isserlin, R. (2022d). BCB420: Lecture 6 - differential expression.
Isserlin, R. (2022e). BCB420: Lecture 7 - annotation dataset and intro to pathway analysis.
Kucera, M., Isserlin, R., Arkhangorodsky, A., & Bader, G. D. (2016). AutoAnnotate: A cytoscape app for summarizing networks with semantic annotations. F1000Research, 5(1717). https://doi.org/10.12688/f1000research.9090.1
Mata-Martínez, P., Bergón-Gutiérrez, M., & Fresno, C. del. (2022). Dectin-1 signaling update: New perspectives for trained immunity. Frontiers in Immunology, 13. https://doi.org/10.3389/fimmu.2022.812148
Merico, D., Isserlin, R., Stueker, O., Emili, A., & Bader, G. D. (2010). Enrichment map: A network-based method for gene-set enrichment visualization and interpretation. PLoS One, 5(11). https://doi.org/10.1371/journal.pone.0013984
Morgan, M. (2021). BiocManager: Access the bioconductor project package repository. https://CRAN.R-project.org/package=BiocManager
Müller, M. R., & Rao, A. (2010). NFAT, immunity and cancer: A transcription factor comes of age. Nat Rev Immunol, 10, 645–656. https://doi.org/10.1038/nri2818
Raudvere, U., Kolberg, L., Kuzmin, I., Arak, T., Adler, P., Peterson, H., & Vilo, J. (2019). g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Research, 47(W1), W191–W198. https://doi.org/10.1093/nar/gkz369
Reynolds, J. M., Martinez, G. J., Chung, Y., & Dong, C. (2012). Toll-like receptor 4 signaling in t cells promotes autoimmune inflammation. Proceedings of the National Academy of Sciences of the United States of America, 109(32), 13064–13069. https://doi.org/10.1073/pnas.1120585109
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., & Smyth, G. K. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research, 43(7), 47. https://doi.org/10.1093/nar/gkv007
Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England), 26(1), 139–140. https://doi.org/10.1093/bioinformatics/btp616
Shannnon, P., Markiel, A., Owen, O., Baliga, N. S., Wang, J. T., Ramage, D., Amin, N., Schwikowski, B., & Ideker, T. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498–2504. https://doi.org/10.1101/gr.1239303
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., & Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43), 15545–15550. https://doi.org/10.1073/pnas.0506580102